Boosting Random Forests to Reduce Bias: One-Step Boosted Forest and its Variance Estimate
Authors
Abstract
In this paper we propose using the principle of boosting to reduce the bias of a random forest prediction in the regression setting. From the original random forest fit we extract the residuals and then fit a second random forest to those residuals; we call the sum of the two random forests a one-step boosted forest. We show with simulated and real data that the one-step boosted forest has reduced bias compared to the original random forest. The paper also provides a variance estimate for the one-step boosted forest via an extension of the infinitesimal jackknife estimator. Using this variance estimate we construct prediction intervals for the boosted forest and show that they have good coverage probabilities. Combining the bias reduction with the variance estimate, we show that the one-step boosted forest achieves a significant reduction in predictive mean squared error and thus an improvement in predictive performance. On datasets from the UCI repository, the one-step boosted forest empirically outperforms both random forest and gradient boosting machine algorithms. In principle such boosting can be extended beyond one step, and the same techniques outlined in this paper yield variance estimates for those predictors as well. Further boosting reduces bias even more, but it risks over-fitting and increases the computational burden.
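The two-stage construction described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn's `RandomForestRegressor` stands in for the forests, and it uses out-of-bag residuals for the second stage (an implementation choice not specified in the abstract).

```python
# Sketch of a one-step boosted forest: fit a forest, fit a second
# forest to its residuals, and predict with the sum of the two fits.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

def one_step_boosted_forest(X, y, n_estimators=300, random_state=0):
    # Stage 1: ordinary random forest fit.
    rf1 = RandomForestRegressor(n_estimators=n_estimators,
                                oob_score=True,
                                random_state=random_state).fit(X, y)
    # Residuals from the out-of-bag predictions (less optimistic than
    # in-sample residuals; a choice made here for illustration).
    residuals = y - rf1.oob_prediction_
    # Stage 2: a second forest fit to the stage-1 residuals.
    rf2 = RandomForestRegressor(n_estimators=n_estimators,
                                random_state=random_state + 1).fit(X, residuals)
    return rf1, rf2

def predict_boosted(rf1, rf2, X):
    # The one-step boosted forest prediction is the sum of the two forests.
    return rf1.predict(X) + rf2.predict(X)

X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=0)
rf1, rf2 = one_step_boosted_forest(X, y)
pred = predict_boosted(rf1, rf2, X)
```

The variance estimate and prediction intervals from the paper are not reproduced here; they require the infinitesimal jackknife machinery applied jointly to the two forests.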
Similar papers
Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500
In recent years, machine learning research has gained momentum: new developments in the field of deep learning allow for multiple levels of abstraction and are starting to supersede well-known and powerful tree-based techniques that operate mainly on the original feature space. All these methods can be applied to various fields, including finance. This article implements and analyses the effectiven...
Linear and Nonlinear Trading Models with Gradient Boosted Random Forests and Application to Singapore Stock Market
This paper presents new trading models for the stock market and tests whether they can consistently generate excess returns on the Singapore Exchange (SGX). Instead of modeling stock prices in the conventional way, we construct models that relate market indicators directly to a trading decision. Furthermore, unlike a reversal trading system or a binary system of buy and sell, we allo...
Interpreting Tree Ensembles with inTrees
Tree ensembles such as random forests and boosted trees are accurate but difficult to understand, debug and deploy. In this work, we provide the inTrees (interpretable trees) framework that extracts, measures, prunes and selects rules from a tree ensemble, and calculates frequent variable interactions. A rule-based learner, referred to as the simplified tree ensemble learner (STEL), can also b...
Bias-corrected random forests in regression
An Empirical Comparison of Supervised Learning Algorithms Using Different Performance Metrics
We present results from a large-scale empirical comparison of ten learning methods: SVMs, neural nets, logistic regression, naive Bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We evaluate the methods on binary classification problems using nine performance criteria: accuracy, squared error, cross-entropy, ROC area, F-score, p...